Goto

Collaborating Authors

 clickstream data


A Utility-Mining-Driven Active Learning Approach for Analyzing Clickstream Sequences

arXiv.org Artificial Intelligence

In rapidly evolving e-commerce industry, the capability of selecting high-quality data for model training is essential. This study introduces the High-Utility Sequential Pattern Mining using SHAP values (HUSPM-SHAP) model, a utility mining-based active learning strategy to tackle this challenge. We found that the parameter settings for positive and negative SHAP values impact the model's mining outcomes, introducing a key consideration into the active learning framework. Through extensive experiments aimed at predicting behaviors that do lead to purchases or not, the designed HUSPM-SHAP model demonstrates its superiority across diverse scenarios. The model's ability to mitigate labeling needs while maintaining high predictive performance is highlighted. Our findings demonstrate the model's capability to refine e-commerce data processing, steering towards more streamlined, cost-effective prediction modeling.


TRACE: Transformer-based user Representations from Attributed Clickstream Event sequences

arXiv.org Artificial Intelligence

For users navigating travel e-commerce websites, the process of researching products and making a purchase often results in intricate browsing patterns that span numerous sessions over an extended period of time. The resulting clickstream data chronicle these user journeys and present valuable opportunities to derive insights that can significantly enhance personalized recommendations. We introduce TRACE, a novel transformer-based approach tailored to generate rich user embeddings from live multi-session clickstreams for real-time recommendation applications. Prior works largely focus on single-session product sequences, whereas TRACE leverages site-wide page view sequences spanning multiple user sessions to model long-term engagement. Employing a multi-task learning framework, TRACE captures comprehensive user preferences and intents distilled into low-dimensional representations. We demonstrate TRACE's superior performance over vanilla transformer and LLM-style architectures through extensive experiments on a large-scale travel e-commerce dataset of real user journeys, where the challenges of long page-histories and sparse targets are particularly prevalent. Visualizations of the learned embeddings reveal meaningful clusters corresponding to latent user states and behaviors, highlighting TRACE's potential to enhance recommendation systems by capturing nuanced user interactions and preferences


ClickTree: A Tree-based Method for Predicting Math Students' Performance Based on Clickstream Data

arXiv.org Artificial Intelligence

ClickTree: A Tree-based Method for Predicting Math Students' Performance Based on Clickstream Data The prediction of student performance and the analysis of students' learning behavior play an important role in enhancing online courses. By analysing a massive amount of clickstream data that captures student behavior, educators can gain valuable insights into the factors that influence academic outcomes and identify areas of improvement in courses. In this study, we developed ClickTree, a tree-based methodology, to predict student performance in mathematical assignments based on students' clickstream data. We extracted a set of features, including problem-level, assignment-level and student-level features, from the extensive clickstream data and trained a CatBoost tree to predict whether a student successfully answers a problem in an assignment. The developed method achieved an AUC of 0.78844 in the Educational Data Mining Cup 2023 and ranked second in the competition. Furthermore, our results indicate that students encounter more difficulties in the problem types that they must select a subset of answers from a given set as well as problem subjects of Algebra II. Additionally, students who performed well in answering end-unit assignment problems engaged more with in-unit assignments and answered more problems correctly, while those who struggled had higher tutoring request rate. The proposed method can be utilized to improve students' learning experiences, and the above insights can be integrated into mathematical courses to enhance students' learning outcomes. In recent years, massive amounts of log data have been collected from students' interactions with online courses, providing researchers with valuable information to analyze student behavior and its impact on academic performance (Yi et al., 2018; Aljohani et al., 2019). By examining clickstream data, educators can gain deeper insights into students' study habits, navigation patterns, and levels of engagement (Wen and Rosé, 2014; Li et al., 2020; Matcha et al., 2020).


Analyzing the Capabilities of Nature-inspired Feature Selection Algorithms in Predicting Student Performance

arXiv.org Artificial Intelligence

Predicting student performance is key in leveraging effective pre-failure interventions for at-risk students. As educational data grows larger, more effective means of analyzing student data in a timely manner are needed in order to provide useful predictions and interventions. In this paper, an analysis was conducted to determine the relative performance of a suite of nature-inspired algorithms in the feature-selection portion of ensemble algorithms used to predict student performance. A Swarm Intelligence ML engine (SIMLe) was developed to run this suite in tandem with a series of traditional ML classification algorithms to analyze three student datasets: instance-based clickstream data, hybrid single-course performance, and student meta-performance when taking multiple courses simultaneously. These results were then compared to previous predictive algorithms and, for all datasets analyzed, it was found that leveraging an ensemble approach using nature-inspired algorithms for feature selection and traditional ML algorithms for classification significantly increased predictive accuracy while also reducing feature set size by up to 65 percent.


MLOps: How to Operationalise E-Commerce Product Recommendation System

#artificialintelligence

One of the most common challenges in an e-commerce business to build a well-performing product recommender and categorisation model. A product recommender is used to recommend similar products to users so that total time and money spent on platform per user will be increased. There is also a need to have a model to categorise products correctly since there might be some wrongly categorised products in those platforms especially where most of content is generated by users as in case of classified websites. A product categorisation model is used to catch those products and place them back into their right categories to improve overall user experience on the platform. This article has 2 main parts.


Clustering and Semi-Supervised Classification for Clickstream Data via Mixture Models

arXiv.org Machine Learning

Finite mixture models have been used for unsupervised learning for some time, and their use within the semi-supervised paradigm is becoming more commonplace. Clickstream data is one of the various emerging data types that demands particular attention because there is a notable paucity of statistical learning approaches currently available. A mixture of first-order continuous time Markov models is introduced for unsupervised and semi-supervised learning of clickstream data. This approach assumes continuous time, which distinguishes it from existing mixture model-based approaches; practically, this allows account to be taken of the amount of time each user spends on each webpage. The approach is evaluated, and compared to the discrete time approach, using simulated and real data.


Solving One of the Biggest Challenges for AI-Based Search Engines: Relevance

#artificialintelligence

Let's learn how to implement ClickModels in order to extract Relevance from clickstream data. These steps tend to be what is already necessary for implementing an effective enough search engine system for a given application. Eventually, the requirement to upgrade the system to deliver customized results may arise. Doing so should be simple. One could choose from a set of machine learning ranking algorithms, train some selected models, prepare them for production and observe the results.


Surveys without Questions: A Reinforcement Learning Approach

arXiv.org Artificial Intelligence

The 'old world' instrument, survey, remains a tool of choice for firms to obtain ratings of satisfaction and experience that customers realize while interacting online with firms. While avenues for survey have evolved from emails and links to pop-ups while browsing, the deficiencies persist. These include - reliance on ratings of very few respondents to infer about all customers' online interactions; failing to capture a customer's interactions over time since the rating is a one-time snapshot; and inability to tie back customers' ratings to specific interactions because ratings provided relate to all interactions. To overcome these deficiencies we extract proxy ratings from clickstream data, typically collected for every customer's online interactions, by developing an approach based on Reinforcement Learning (RL). We introduce a new way to interpret values generated by the value function of RL, as proxy ratings. Our approach does not need any survey data for training. Yet, on validation against actual survey data, proxy ratings yield reasonable performance results. Additionally, we offer a new way to draw insights from values of the value function, which allow associating specific interactions to their proxy ratings. We introduce two new metrics to represent ratings - one, customer-level and the other, aggregate-level for click actions across customers. Both are defined around proportion of all pairwise, successive actions that show increase in proxy ratings. This intuitive customer-level metric enables gauging the dynamics of ratings over time and is a better predictor of purchase than customer ratings from survey. The aggregate-level metric allows pinpointing actions that help or hurt experience. In sum, proxy ratings computed unobtrusively from clickstream, for every action, for each customer, and for every session can offer interpretable and more insightful alternative to surveys.


Predicting Online Item-choice Behavior: A Shape-restricted Regression Perspective

arXiv.org Artificial Intelligence

This paper examines the relationship between user pageview (PV) histories and their item-choice behavior on an e-commerce website. We focus on PV sequences, which represent time series of the number of PVs for each user--item pair. We propose a shape-restricted optimization model that accurately estimates item-choice probabilities for all possible PV sequences. This model imposes monotonicity constraints on item-choice probabilities by exploiting partial orders for PV sequences, according to the recency and frequency of a user's previous PVs. To improve the computational efficiency of our optimization model, we devise efficient algorithms for eliminating all redundant constraints according to the transitivity of the partial orders. Experimental results using real-world clickstream data demonstrate that our method achieves higher prediction performance than that of a state-of-the-art optimization model and common machine learning methods.


Dropout Prediction over Weeks in MOOCs via Interpretable Multi-Layer Representation Learning

arXiv.org Artificial Intelligence

Massive Open Online Courses (MOOCs) have become popular platforms for online learning. While MOOCs enable students to study at their own pace, this flexibility makes it easy for students to drop out of class. In this paper, our goal is to predict if a learner is going to drop out within the next week, given clickstream data for the current week. To this end, we present a multi-layer representation learning solution based on branch and bound (BB) algorithm, which learns from low-level clickstreams in an unsupervised manner, produces interpretable results, and avoids manual feature engineering. In experiments on Coursera data, we show that our model learns a representation that allows a simple model to perform similarly well to more complex, task-specific models, and how the BB algorithm enables interpretable results. In our analysis of the observed limitations, we discuss promising future directions.